Combating against Web Spam through Content Features
Abstract
Web spamming refers to the use of unethical search engine optimization practices to gain a better position on the Search Engine Result Page (SERP). Judging whether a web page is spam or ham is complicated because different search engines have different standards. Link-based spamming, cloaking, and content spamming are the main focus of existing anti-spam techniques. Although these techniques have had much success, they still face problems when combating new kinds of spamming. This paper presents the use of different machine learning methods to address web spam detection as a supervised classification problem. We used the public WEBSPAM-UK-2007 data set in our experiments. The final results are compared and analyzed against well-known classifiers. The results show that JRip and J48 perform well compared to the other two methods.
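As a rough illustration of what supervised classification on content features can look like, the sketch below computes a few content-based statistics for a page's text and fits a one-feature threshold rule on labeled examples. Everything in it is an assumption made for illustration: the feature names (`num_words`, `avg_word_len`, `top_word_frac`, `compress_ratio`), the helper functions, and the toy pages are hypothetical and are not taken from the paper or from WEBSPAM-UK-2007. The threshold rule is a much simpler stand-in for rule and tree learners such as JRip and J48, not a reimplementation of them.

```python
import zlib
from collections import Counter


def content_features(text):
    """Compute a few simple content statistics for one page's visible text."""
    words = text.lower().split()
    n = len(words)
    if n == 0:
        return {"num_words": 0, "avg_word_len": 0.0,
                "top_word_frac": 0.0, "compress_ratio": 1.0}
    counts = Counter(words)
    raw = text.encode("utf-8")
    return {
        "num_words": n,
        "avg_word_len": sum(len(w) for w in words) / n,
        # Share of the page taken up by its single most frequent word.
        "top_word_frac": counts.most_common(1)[0][1] / n,
        # Highly repetitive text compresses well, so this ratio grows.
        "compress_ratio": len(raw) / len(zlib.compress(raw)),
    }


def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature with the fewest training errors.

    The resulting rule predicts "spam" when the feature value exceeds the
    threshold. Returns None if the feature takes a single value only.
    """
    values = sorted({r[feature] for r in rows})
    best_errors, best_threshold = len(rows) + 1, None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        errors = sum((r[feature] > t) != (y == "spam")
                     for r, y in zip(rows, labels))
        if errors < best_errors:
            best_errors, best_threshold = errors, t
    return best_threshold


# Invented toy examples (not real data), labeled by hand.
pages = [
    ("cheap pills cheap pills cheap pills buy cheap pills now", "spam"),
    ("win money win money win money win", "spam"),
    ("The library opens at nine and closes at five on weekdays", "ham"),
    ("Our team published the quarterly report yesterday afternoon", "ham"),
]
rows = [content_features(text) for text, _ in pages]
labels = [label for _, label in pages]

threshold = fit_stump(rows, labels, "top_word_frac")
predictions = ["spam" if r["top_word_frac"] > threshold else "ham" for r in rows]
print(threshold, predictions)
```

A real experiment would use many more features and held-out evaluation; the single-feature rule here only shows how labeled examples turn a content statistic into a decision.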
Similar resources
Application of Machine Learning in Combating Web Spam
High ranking of a Web site in search engines can be directly correlated to high revenues nowadays. This amplifies the phenomenon of Web spamming which can be defined as preparing or manipulating any features of Web documents or hosts to mislead search engines’ ranking algorithms to gain undeservedly high position in search results. Web spam remarkably deteriorates the information quality availa...
Web Spam Detection
Definition: Web spam refers to a host of techniques to subvert the ranking algorithms of web search engines and cause them to rank search results higher than they would otherwise. Examples of such techniques include content spam (populating web pages with popular and often highly monetizable search terms), link spam (creating links to a page in order to increase its link-based score), and cloakin...
Feature Selection-model-based Content Analysis for Combating Web Spam
With the increasing growth of the Internet and the World Wide Web, information retrieval (IR) has attracted much attention in recent years. Quick, accurate, and high-quality information mining is the core concern of successful search companies. Likewise, spammers try to manipulate IR systems to fulfil their stealthy needs. Spamdexing (also known as web spamming) is one of the spamming techniques of adversari...
Fighting Web Spam
High ranking of a Web site in search engines can be directly correlated to high revenues. This amplifies the phenomenon of Web spamming which can be defined as preparing or manipulating any features of Web documents or hosts to mislead search engines’ ranking algorithms to gain an undeservedly high position in search results. Web spam remarkably deteriorates the information quality available on...
An Effective Model for SMS Spam Detection Using Content-based Features and Averaged Neural Network
In recent years, there has been considerable interest in using the short message service (SMS) as one of the essential and straightforward communication services on mobile devices. The increased popularity of this service has also increased the number of attacks on mobile devices, such as SMS spam messages. SMS spam messages constitute a real problem to mobile subscribers; this worries telecomm...
Publication date: 2015